
    Predicting large scale fine grain energy consumption

    Today a large volume of energy-related data is continuously being collected. Extracting actionable knowledge from such data is a multi-step process that opens up a variety of interesting and novel research issues across two domains: energy and computer science. The computer science aim is to provide energy scientists with cutting-edge and scalable engines to effectively support them in their daily research activities. This paper presents SPEC, a scalable and distributed predictor of fine-grain energy consumption in buildings. SPEC exploits a data stream analysis methodology over a sliding time window to train a prediction model tailored to each building. The building model is then exploited to predict the upcoming energy consumption at a time instant in the near future. SPEC currently integrates artificial neural networks and the random forest regression algorithm. The SPEC methodology exploits the computational advantages of distributed computing frameworks: the current implementation runs on Spark. As a case study, real data on thermal energy consumption collected in a major city have been exploited to preliminarily assess SPEC's accuracy. The initial results are promising and represent a first step towards predicting fine-grain energy consumption over a sliding time window.
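
    As a rough illustration of the sliding-window training step, the sketch below trains a random forest on only the most recent readings of one building and predicts the next value. It is a minimal single-machine stand-in (scikit-learn) for SPEC's distributed Spark implementation; the window size, lag count, and function names are assumptions, not the paper's configuration.

```python
# Minimal single-machine sketch of SPEC's sliding-window idea (scikit-learn).
# The paper's implementation is distributed on Spark; names here are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def make_lagged(series, n_lags):
    """Turn a 1-D consumption series into (lag-vector, next-value) pairs."""
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = np.array(series[n_lags:])
    return X, y

def predict_next(history, window=24 * 7, n_lags=24):
    """Train on the most recent `window` readings only (the sliding window),
    then predict the consumption at the next time instant."""
    recent = history[-window:]
    X, y = make_lagged(recent, n_lags)
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X, y)
    return model.predict([recent[-n_lags:]])[0]

# Hourly thermal-energy readings for one building (synthetic example data).
rng = np.random.default_rng(0)
history = list(50 + 10 * np.sin(np.arange(2000) * 2 * np.pi / 24)
               + rng.normal(0, 1, 2000))
print(predict_next(history))
```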

    Cinematographic Shot Classification with Deep Ensemble Learning

    Cinematographic shot classification assigns a category to each shot on the basis of either the field size or the movement performed by the camera. In this work, we focus on the camera field of view, which is determined by the portion of the subject and of the environment visible in the frame. Automating this task can help freelancers and studios in the visual creative field in their daily activities. In our study, we took into account eight classes of film shots: long shot, medium shot, full figure, American shot, half figure, half torso, close up, and extreme close up. Cinematographic shot classification is a complex task, so we combined state-of-the-art techniques to deal with it. Specifically, we fine-tuned three separate VGG-16 models and combined their predictions through the stacking ensemble technique to obtain better performance. Experimental results demonstrate the effectiveness of the proposed approach in performing the classification task with good accuracy: our method achieved 77% accuracy without relying on data augmentation techniques. We also evaluated our approach in terms of F1 score, precision, and recall, and the confusion matrices show that most misclassified samples belonged to a neighboring class.
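
    The sketch below illustrates the stacking idea in general form: three fine-tuned VGG-16 branches whose softmax outputs are concatenated and fed to a logistic-regression meta-learner. The layer choices, frozen base, and meta-learner are assumptions for illustration, not the paper's exact configuration.

```python
# Hedged sketch of stacking three VGG-16 branches (TensorFlow/Keras + sklearn).
import numpy as np
import tensorflow as tf
from sklearn.linear_model import LogisticRegression

NUM_CLASSES = 8  # long shot ... extreme close up

def build_branch():
    base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                       input_shape=(224, 224, 3))
    base.trainable = False  # this sketch fine-tunes only the new head
    x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
    out = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)
    model = tf.keras.Model(base.input, out)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    return model

branches = [build_branch() for _ in range(3)]

def stacked_features(models, images):
    """Concatenate the per-branch class probabilities into one feature vector."""
    return np.hstack([m.predict(images, verbose=0) for m in models])

# Hypothetical usage, after each branch has been trained with model.fit(...):
# meta = LogisticRegression(max_iter=1000).fit(stacked_features(branches, X_val), y_val)
# y_pred = meta.predict(stacked_features(branches, X_test))
```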

    Forecasting Li-ion battery State of Charge using Long-Short-Term-Memory network

    Estimating the state of charge (SOC) of lithium-ion batteries (LIBs) has become a highly desirable and critical task, especially as electrified vehicles become more common. However, due to the non-linear behaviour of these batteries, accurately estimating the SOC remains a challenge. As a result, traditional theory-based methods are increasingly being replaced by data-driven approaches, thanks to the greater availability of battery data and advances in artificial intelligence. Recurrent neural networks (RNNs), in particular, are promising methods because they can capture temporal dependencies and predict the SOC without a battery model. Long short-term memory (LSTM) networks, a specific type of RNN, can accurately predict SOC values in real time and forecast future SOC values within different time horizons.
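
    A minimal sketch of the LSTM idea is shown below: a sequence of measurements is mapped to a SOC estimate at the last time step. The input features (voltage, current, temperature) and the layer sizes are assumptions for illustration; the abstract does not specify the architecture.

```python
# Hedged PyTorch sketch of LSTM-based SOC estimation from measurement sequences.
import torch
import torch.nn as nn

class SOCLSTM(nn.Module):
    def __init__(self, n_features=3, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                          # x: (batch, time, features)
        out, _ = self.lstm(x)
        return torch.sigmoid(self.head(out[:, -1]))  # SOC constrained to [0, 1]

model = SOCLSTM()
seq = torch.randn(8, 100, 3)   # 8 synthetic sequences of 100 time steps,
soc = model(seq)               # assumed features: voltage, current, temperature
print(soc.shape)               # torch.Size([8, 1])
```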

    Prompting the data transformation activities for cluster analysis on collections of documents

    In this work we argue for a new self-learning engine able to suggest good transformation methods and weighting schemas to the analyst for a given data collection. This new generation of systems, named SELF-DATA (SELF-learning DAta TrAnsformation), relies on an engine capable of exploring different data weighting schemas (e.g., normalized term frequencies, logarithmic entropy) and data transformation methods (e.g., PCA, LSI) before applying a given data mining algorithm (e.g., cluster analysis), evaluating and comparing solutions through different quality indices (e.g., weighted Silhouette), and presenting the top-3 solutions to the analyst. SELF-DATA will also include a knowledge base storing the results of experiments on previously processed datasets, and a classification algorithm trained on the knowledge base content to forecast the best methods for future analyses. SELF-DATA's current implementation runs on Apache Spark, a state-of-the-art distributed computing framework. The preliminary validation performed on 4 collections of documents highlights that the TF-IDF and logarithmic entropy weighting methods are effective at measuring item relevance in sparse datasets, and that the LSI method outperforms PCA in the presence of a larger feature domain.
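
    The exploration loop can be pictured as in the sketch below: enumerate weighting schemas and transformations, cluster each resulting representation, and rank configurations by silhouette. It is a scikit-learn toy on synthetic documents, not the Spark engine; raw term frequency stands in for the logarithmic-entropy schema, which scikit-learn does not provide out of the box.

```python
# Illustrative sketch of SELF-DATA's configuration search (scikit-learn).
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.decomposition import TruncatedSVD, PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

docs = ["energy data mining", "document clustering methods",
        "spark distributed engines", "text weighting schemas"] * 10  # toy corpus

results = []
for wname, vec in [("tf-idf", TfidfVectorizer()), ("tf", CountVectorizer())]:
    X = vec.fit_transform(docs)
    for tname, transform in [("LSI", TruncatedSVD(n_components=2, random_state=0)),
                             ("PCA", PCA(n_components=2, random_state=0))]:
        # PCA needs a dense matrix; LSI (TruncatedSVD) works on sparse input.
        Z = transform.fit_transform(X.toarray() if tname == "PCA" else X)
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)
        results.append((silhouette_score(Z, labels), wname, tname))

# Present the top-3 (weighting, transformation) configurations to the analyst.
for score, weighting, transform in sorted(results, reverse=True)[:3]:
    print(f"{weighting} + {transform}: silhouette = {score:.3f}")
```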

    Data-Driven Estimation of Heavy-Truck Residual Value at the Buy-Back

    In a context of deep transformation of the entire automotive industry, driven by pervasive and native connectivity, commercial vehicles (heavy, light, and buses) generate and transmit much more data than passenger cars, with a much higher expected value, motivated by the higher cost of the vehicles and their added-value businesses, such as logistics, freight, and transportation management. This paper presents a data-driven and unsupervised methodology that provides a descriptive model for estimating the residual value of heavy trucks subject to buy-back. A huge amount of telematics data characterizing the actual usage of commercial vehicles is jointly analyzed with various external conditions (e.g., altimetry) affecting the trucks' performance, to estimate the devaluation of each vehicle at the buy-back. The proposed approach has been evaluated on a large set of real-world heavy trucks, demonstrating its effectiveness in correctly assessing the real status of wear and the residual value at the end of leasing contracts, and providing a few quantitative insights through an informative, interactive, and user-friendly dashboard that supports decisions on the next business strategies to adopt. The proposed solution has already been deployed by a private company within its data analytics services, since (1) it provides an interpretable descriptive model of the main factors/parameters, and corresponding weights, affecting the residual value, and (2) the experimental results confirmed the promising outcomes of the proposed data-driven methodology.
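
    One plausible shape for the unsupervised, descriptive step is sketched below: aggregate telematics features per truck, cluster the usage profiles, and read the cluster averages as interpretable wear profiles. All feature names and the choice of k-means are hypothetical; the paper does not disclose its exact feature set or algorithm.

```python
# Hypothetical sketch of a descriptive usage-profile analysis for trucks.
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
trucks = pd.DataFrame({                              # synthetic telematics data
    "km_total": rng.normal(400_000, 80_000, 200),
    "avg_load_t": rng.normal(18, 4, 200),
    "pct_mountain_km": rng.uniform(0, 40, 200),      # external condition: altimetry
    "harsh_brakes_per_100km": rng.normal(3, 1, 200),
})

X = StandardScaler().fit_transform(trucks)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
profiles = trucks.groupby(labels).mean().round(1)
print(profiles)  # each row is a descriptive usage/wear profile per cluster
```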

    NEMICO: Mining network data through cloud-based data mining techniques

    Thanks to the rapid advances in Internet-based applications and in data acquisition and storage technologies, petabyte-sized network data collections are becoming more and more common, thus prompting the need for scalable data analysis solutions. By leveraging today's ubiquitous many-core computer architectures and the increasingly popular cloud computing paradigm, the applicability of data mining algorithms to these large volumes of network data can be scaled up to gain interesting insights. This paper proposes NEMICO, a comprehensive Big Data mining system targeted at network traffic flow analyses (e.g., traffic flow characterization, anomaly detection, multiple-level pattern mining). NEMICO comprises new approaches that contribute to a paradigm shift in distributed data mining by addressing the most challenging issues related to Big Data, such as data sparsity, horizontal scaling, and parallel computation.
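
    To make the kind of analysis concrete, the sketch below shows one NEMICO-style task, per-host flow characterization with a simple statistical anomaly flag, written for Spark's DataFrame API. The input path, column names, and the 3-sigma rule are all assumptions, not NEMICO's actual pipeline.

```python
# Hedged PySpark sketch of flow characterization plus a simple anomaly flag.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("nemico-sketch").getOrCreate()

# Assumed flow-record schema: src_ip, dst_ip, bytes, packets (path is illustrative).
flows = spark.read.csv("flows.csv", header=True, inferSchema=True)

per_host = flows.groupBy("src_ip").agg(
    F.count("*").alias("n_flows"),
    F.sum("bytes").alias("total_bytes"),
    F.avg("packets").alias("avg_packets"),
)

stats = per_host.agg(F.avg("total_bytes").alias("mu"),
                     F.stddev("total_bytes").alias("sigma")).first()

# Flag hosts whose traffic volume exceeds the mean by more than 3 standard deviations.
anomalies = per_host.filter(F.col("total_bytes") > stats.mu + 3 * stats.sigma)
anomalies.show()
```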

    A data-driven energy platform: from energy performance certificates to human-readable knowledge through dynamic high-resolution geospatial maps

    The energy performance certificate (EPC) is a document that certifies the average annual energy consumption of a building in standard conditions and allows it to be classified within a so-called energy class. At a time when greenhouse gas emissions are of considerable concern and the objective is to improve energy security and reduce energy costs in our cities, energy certification has a key role to play. The proposed work aims to model and characterize the energy efficiency of residential buildings by exploring heterogeneous, geo-referenced data with different spatial and temporal granularity. The paper presents TUCANA (TUrin Certificates ANAlysis), an innovative data mining engine able to cover the whole analytics workflow for the analysis of energy performance certificates, including cluster analysis and a model generalization step based on a novel spatially constrained K-NN, able to automatically characterize a broad set of buildings distributed across a major city and to predict different energy-related features for new, unseen buildings. The energy certificates analyzed in this work have been released as open data by the Piedmont Region (north-west Italy). The results obtained on a large dataset are displayed in novel, dynamic, and interactive geospatial maps that can be consulted through a web application integrated into the system. The visualization tool provides transparent and human-readable knowledge to various stakeholders, thus supporting the decision-making process.
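
    A spatially constrained K-NN can be read as follows: a new building inherits the most common energy class among its k nearest certified buildings, but only neighbours within a maximum distance count. The sketch below is one interpretation of that idea under these assumptions, not the paper's exact algorithm; the distance cap and coordinates are invented.

```python
# Hypothetical sketch of a spatially constrained K-NN for energy-class prediction.
import numpy as np
from collections import Counter

def spatial_knn(coords, labels, query, k=5, max_dist=500.0):
    d = np.linalg.norm(coords - query, axis=1)   # Euclidean distance in metres
    near = np.argsort(d)[:k]                     # the k nearest candidates
    near = near[d[near] <= max_dist]             # spatial constraint: drop far ones
    if near.size == 0:
        return None                              # no certified building close enough
    return Counter(labels[near]).most_common(1)[0][0]

# Certified buildings: projected (x, y) coordinates in metres + energy class.
coords = np.array([[0, 0], [50, 40], [80, 10], [3000, 3000], [60, 70]])
labels = np.array(["C", "C", "D", "A", "C"])
print(spatial_knn(coords, labels, np.array([30, 30])))  # -> "C"
```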